• Premise of the study: One of the many advantages offered by automated palynology systems is the ability to vastly increase the 
number of observations made on a particular sample or samples. This is of particular benefit when attempting to fully quantify 
the degree of variation within or between closely related pollen types. 

• Methods: An automated palynology system (Classifynder) has been used to further investigate the variation in pollen morphol- 
ogy between two New Zealand species of Myrtaceae (Leptospermum scoparium and Kunzea ericoides) that are of significance 
in the New Zealand honey industry. Seven geometric features extracted from automatically gathered digital images were used 
to characterize the range of shape and size of the two taxa, and to examine the extent of previously reported overlap in these 
variables. 

• Results: Our results indicate a degree of overlap in all cases. The narrowest overlap was in measurements of maximum Feret 
diameter (MFD) in grains oriented in polar view. Multivariate statistical analysis using all seven factors provided the most 
robust discrimination between the two types. 

• Discussion: Further work is required before this approach could be routinely applied to separating the two pollen types used in 
this study, most notably the development of comprehensive reference distributions for the types in question. 

Key words: automated palynology; Classifynder; geometric features; Kunzea ericoides; Leptospermum scoparium; Myrta- 
ceae; pollen morphology. 



Interpretation of palynological data is often limited by taxo- 
nomic resolution. Numerous examples exist of genera or even 
whole families where minimal morphological variation pre- 
cludes distinguishing pollen of individual taxa from one another, 
resulting in all types being lumped together into one class. This 
confounds accurate interpretation of pollen spectra (Birks and 
Birks, 2000; Punyasena et al., 2012). Recent advances in imag- 
ing technology and analysis have led to some progress in dif- 
ferentiating between notorious problem types (e.g., Sivaguru 
et al., 2012; Mander et al., 2013). However, it is likely that in 
many cases, even if differences between two (or more) closely 
related taxa can be identified, there remains a degree of overlap 
that prevents firm classification of every grain. Furthermore, in 
the majority of pollen morphology studies that have been under- 
taken both recently and throughout the history of the discipline, 
keys and criteria for differentiation are typically based on a rela- 
tively limited number of observations on pollen grains from a 
similarly limited number of individuals. This is an extremely 
small sample of a population that is typically in the order of 
thousands of pollen grains per flower, thousands of flowers per 
individual, and thousands to millions of individuals per species. 
This of course results from the practical limitations associated 
with manually measuring individual grains on a microscope 
and/or microscope imagery. In cases where differentiation 
is particularly important, attempts have been made to more 

Manuscript received 30 March 2014; revision accepted 22 July 2014. 
3 Author for correspondence: k.holt@massey.ac.nz 

doi:10.3732/apps.1400032 



comprehensively investigate the range in overlap (in features 
such as size, surface texture, and pore number), followed by ap- 
plication of various statistical techniques to fit distributions of 
these parameters of a sample of unknown composition to deter- 
mine the contribution from each of the closely related taxa (e.g., 
Hansen and Cushing, 1973; Gordon and Prentice, 1977). How- 
ever, the requirement for large numbers of measurements on 
many pollen grains makes these studies extremely time consum- 
ing, and therefore seldom performed (Birks and Birks, 1980). 

Advances in digital imaging technology, image processing, 
and computer power are increasingly being harnessed for paly- 
nological research (Holt and Bennett, 2014). Their potential for 
application in the types of studies described above is obvious. At 
the most basic level, they can be employed to generate the large 
data sets of observations and measurements needed for accurate 
assessment of the existence and scope of overlap between taxa. 
Ideally, they can provide automated identification of individual 
pollen grains (with associated probabilities). This paper presents 
a pilot study example illustrating the former application, using 
an existing automated palynology system on a pollen differen- 
tiation problem relevant to the New Zealand honey industry. 

The pollen of the Myrtaceae family are generally regarded as 
challenging to distinguish on the basis of morphology (Erdtman, 
1952; Pike, 1956; McIntyre, 1963). Among the members of this 
family present in New Zealand are Leptospermum scoparium 
J. R. Forst. & G. Forst. and Kunzea ericoides (A. Rich.) Joy 
Thomps. (formerly L. ericoides A. Rich.). These two taxa are of 
particular significance because of their nectar contribution to 
honey. Honey derived from the nectar of L. scoparium (manuka) 
attracts a premium price because of its purported health benefits 
(Russell et al., 1990; Allen et al., 1991), while honey derived 
from Kunzea (kanuka) is generally regarded as less beneficial 
(but see recent work by Gannabathula et al., 2012). Melissopalynological 
analysis is routinely applied in determining the floral 
origins of honey (Petersen and Bryant, 2011); however, its ap- 
plication to manuka honey is somewhat limited because of the 
similarity in these two key pollen types. At present, pollen ana- 
lyses can only give the proportion of grains that are the "manuka/ 
kanuka type," and this value must be above a certain percentage 
to be accepted for the honey to be considered manuka (Moar, 
1985; Mildenhall and Tremain, 2005). In many cases, however, 
many of these grains may, in fact, be of Kunzea (kanuka). Chemical 
tests are often employed for validating claims of manuka 
honey authenticity, but these also have demonstrated limitations 
(Ministry for Primary Industries, 2013). Therefore, the ability to 
better separate these two pollen types is highly desirable. 

Previous studies on pollen of New Zealand L. scoparium and K. 
ericoides (McIntyre, 1963; Harris et al., 1992; Moar, 1993) report 
very little morphological variation between the two taxa. Both are 
monad, isopolar, angulaperturate, syncolpate, tectate with tectum 
smooth or faintly rugulate, triangular in shape in polar view, and 
amb concave to convex (Moar, 1993). A slight difference in over- 
all size has been observed, with grains of L. scoparium generally 
being larger than K. ericoides, but with some degree of overlap in 
the size ranges. Also, shape in equatorial view varies slightly, with 
L. scoparium oblate and K. ericoides oblate to peroblate (for defi- 
nitions of pollen morphology terminology, see Punt et al., 2007). 
Overall, they are regarded as indistinguishable (Moar et al., 2011). 

Following the point made earlier, these studies are based on a 
relatively limited number of observations. For example, McIntyre 
(1963) used between 10 and 50 observations, and Moar (1993) 
even fewer. Therefore, we lack an appreciation of the true range 
of this size difference. More comprehensive characterization of 
shape and size parameters through examining a much larger sam- 
ple size of both individual plants and pollen grains is required if 
maximum potential for differentiating between the two types is to 
be realized. To demonstrate the latter aspect (large volume of 
grains), a Classifynder automated palynology system (see www. 
classifynder.com, and Holt et al., 2011) was used to collect a 
large number of images of pollen grains from two individuals 
each of L. scoparium and K. ericoides. From these images, values 
of several geometrical parameters were automatically extracted 
and used as measurements of shape and size. These parameters 
are common to image processing, but not to traditional pollen 
morphology. They may therefore offer new opportunities for 
classification, where other standard palynological features (such 
as aperture number, surface texture, and polar axis length) have 
proved redundant for these types. The equivalent data were also 
collected from one sample of manuka honey and compared with 
the data from the plant pollen in an effort to illustrate what pro- 
portion of the pollen likely came from either species. 

It should be acknowledged that size is regarded by many as 
the least reliable morphological parameter of pollen grains, due 
to the demonstrated potential for variation with age and labora- 
tory treatment methods (Reitsma, 1969; Faegri et al., 1989; 
Flenley, 2003). However, it is apparently the only parameter 
with any potential for variation available to test in this case. All 
attempts have been made to minimize any change in size result- 
ing from laboratory treatments, and where relevant, other causes 
of size change have been addressed. 

At this point, it is necessary to address the rather obvious 
question: why not simply use the Classifynder's classifier to distinguish 
the two types? The answer to this is that a key aim of the 
paper is to illustrate an alternate application of the Classifynder, 
i.e., as a tool for investigating pollen morphology, as opposed to 
just counting and classifying. This work is essentially only har- 
nessing the automated pollen detection and imaging capability 
of the Classifynder for automatic generation of large data sets. 

In cases such as this (i.e., manuka vs. kanuka), such studies 
into basic morphology will likely prove to be crucial to accurate 
automatic classification. The training files that are the basis for 
artificial classifiers must be truly representative of the taxa they 
are trying to classify if there is to be any hope for accuracy. In 
many cases, pollen morphology is known to vary within a spe- 
cies, and therefore the range of this variation must be understood 
to ensure that it is adequately captured in the training set. This 
may require examining pollen from many individuals across the 
full geographical and ecological range of a taxon (Gordon and 
Prentice, 1977). Application of automated technology for image 
capture and making measurements will make such a large task 
much more achievable than with manual methods. 

MATERIALS AND METHODS 

Pollen collection and preparation — Pollen was collected from mature 
anthers of two individuals of L. scoparium and K. ericoides (from here on 
referred to by their common names, manuka and kanuka, respectively), and 
from one sample of monofloral manuka honey. The honey sample contains 
>70% manuka/kanuka-type pollen, as determined by conventional honey pollen 
analysis, qualifying it as monofloral, based on the guidelines of Moar (1985). 
It was also regarded as manuka based on organoleptic properties and on apiarist 
knowledge of location and timing of nectar collection. The pollen was aceto- 
lyzed, suspended in silicone oil, and mounted on glass slides. Acetolysis and 
silicone oil were used because this combination produces optimum images on 
the Classifynder. The process of acetolysis is known to alter the size of pollen 
grains (Reitsma, 1969; Faegri et al., 1989), but it is assumed that the degree of 
such alteration will be consistent across all samples. In addition, with respect to 
size, pollen grains mounted in silicone oil are more stable through time than 
glycerine jelly mounts (Andersen, 1960). The slides were placed on a Classi- 
fynder, which automatically locates and images all pollen grains present on the 
slide (see Holt et al., 2011 for further explanation of Classifynder operation). 
Images of pollen grains oriented in polar and equatorial views were manually 
extracted from the image set produced by the Classifynder (Fig. 1). Five hun- 
dred images of each view per individual were obtained for analysis. Effort was 
made to ensure that minimal time elapsed between sample preparation and 
scanning/image gathering to minimize changes in pollen size, which may have 
occurred with time (Gordon and Prentice, 1977). 

Size and shape measurements — Pollen size is captured through two differ- 
ent measurements: the area of the pollen grain, and maximum Feret diameter 
(MFD). The value for area is determined from the number of pixels in the pollen 
image, converted to µm². MFD is defined as the length of the line segment joining 
the two points on the perimeter of the particle (pollen) that are farthest away 
from each other. 

Measurements of shape used include (all definitions follow those given by 
National Instruments, 2000): 

Elongation factor = the maximum Feret diameter divided by the length 
of the shortest side of a rectangle surrounding the particle. 

Compactness factor = the area of the particle divided by the product of 
the width and the height of a rectangle surrounding the particle. 

Convex hull ratio = the area of the smallest convex polygon containing all 
points in the particle, divided by the area of the particle. 

Heywood circularity factor = the length of the perimeter of the particle 
divided by the circumference of a circle with the same area as the 
particle. 

Hydraulic radius = the particle area divided by the particle perimeter. 
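
The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Classifynder or National Instruments implementation: the grain outline is assumed to be available as an ordered list of (x, y) vertices already scaled to µm, and the "rectangle surrounding the particle" is assumed to be the axis-aligned bounding rectangle. All function names are hypothetical.

```python
import math

def polygon_area(pts):
    """Shoelace formula for a closed outline given as ordered (x, y) vertices."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def polygon_perimeter(pts):
    """Sum of the edge lengths of the closed outline."""
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:] + pts[:1]))

def convex_hull(pts):
    """Andrew's monotone chain; returns the hull vertices in order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def shape_features(outline):
    """Return the seven size and shape features for one outline."""
    outline = [tuple(p) for p in outline]
    area = polygon_area(outline)
    perimeter = polygon_perimeter(outline)
    hull = convex_hull(outline)
    # The two farthest-apart perimeter points always lie on the convex hull.
    mfd = max(math.dist(p, q) for p in hull for q in hull)
    xs = [x for x, _ in outline]
    ys = [y for _, y in outline]
    # Assumption: axis-aligned bounding rectangle.
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    return {
        "area": area,
        "mfd": mfd,
        "elongation_factor": mfd / min(width, height),
        "compactness_factor": area / (width * height),
        "convex_hull_ratio": polygon_area(hull) / area,
        "heywood_circularity": perimeter / (2.0 * math.sqrt(math.pi * area)),
        "hydraulic_radius": area / perimeter,
    }
```

For a convex outline the convex hull ratio is exactly 1, and it rises above 1 as the outline becomes concave, which is why it may respond to a concave amb in polar view.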
Fig. 1. Examples of the Classifynder-generated images used in the study. (A-C) Leptospermum scoparium, equatorially oriented images; (D-F) L. scoparium, 
polar-oriented images; (G-I) Kunzea ericoides, equatorially oriented images; (J-L) K. ericoides, polar-oriented images. 



Values for all seven features are automatically calculated by the Classifyn- 
der software during image capture and tagged to each image as metadata. The 
metadata are then exported to a spreadsheet file. The data for the five samples 
(four plants, one honey) are compared through descriptive statistics and frequency 
distributions (Table 1, Fig. 2). Bivariate plots for all possible combinations of 



the seven features were also created for both views to identify any combinations 
that may prove definitive (Fig. 3). 
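
As an illustration of this descriptive step, the summary statistics and relative frequency distributions for one feature of one sample could be computed as sketched below. The function names are hypothetical, and the use of the sample (n - 1) standard deviation and of equal-width bins are assumptions; the spreadsheet settings actually used are not stated.

```python
import statistics

def describe(values):
    """Minimum, maximum, mean, and sample standard deviation of one feature."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # assumption: n - 1 denominator
    }

def relative_frequency(values, lo, hi, n_bins):
    """Share of values falling in each of n_bins equal-width bins on [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            # A value equal to the upper limit is placed in the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts
```

Using the same bin limits for every sample keeps the resulting distributions directly comparable when they are overlaid.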

These basic, relatively qualitative analyses are followed with a formal analysis 
of the data using suitable statistical machine-learning methods, specifically 
multivariate versions of linear discriminant analysis, logistic regression classification, 



Table 1. Minimum, maximum, mean, and standard deviation (SD) values for each of the seven parameters measured. 

                     Equatorial view                     Polar view
Parameter        K1      K2      M1      M2        K1      K2      M1      M2
Area (µm²)
  Min.        57.39   65.82   70.16   73.91     69.57   72.88   86.61   91.14
  Max.        90.37   98.79  113.75  138.82    113.82  120.23  120.29  163.57
  Mean        73.87   82.10   90.69  108.24     89.69   94.15  101.50  129.04
  SD           5.65    6.16    6.33   12.80      6.65    8.58    6.61   13.76
MFD (µm)
  Min.        10.25   10.63   11.84   11.98     10.49   10.70   12.48   12.49
  Max.        13.77   13.64   15.04   16.85     13.71   13.93   15.68   17.16
  Mean        11.93   12.32   13.33   14.55     12.09   12.09   13.66   14.91
  SD           0.56    0.52    0.63    0.89      0.52    0.52    0.55    0.87
EF
  Min.         1.65    1.63    1.60    1.61      1.46    1.44    1.58    1.53
  Max.         2.19    2.22    2.28    2.41      1.77    1.76    1.98    1.85
  Mean         1.89    1.83    1.93    2.93      1.59    1.56    1.72    1.65
  SD           0.10    0.08    0.10    0.11      0.05    0.05    0.06    0.05
CF
  Min.         0.66    0.71    0.68    0.63      0.62    0.65    0.57    0.61
  Max.         0.82    0.80    0.81    0.80      0.79    0.78    0.73    0.72
  Mean         0.74    0.76    0.74    0.74      0.71    0.73    0.64    0.66
  SD           0.03    0.02    0.02    0.03      0.02    0.02    0.02    0.02
CHR
  Min.         1.00    1.00    1.00    1.00      1.01    1.00    1.03    1.01
  Max.         1.05    1.04    1.04    1.05      1.09    1.07    1.15    1.11
  Mean         1.02    1.01    1.02    1.02      1.03    1.02    1.07    1.05
  SD           0.01    0.01    0.01    0.01      0.01    0.01    0.02    0.01
HCF
  Min.         1.02    1.01    1.02    1.02      1.03    1.02    1.07    1.05
  Max.         1.13    1.08    1.09    1.17      1.15    1.10    1.21    1.15
  Mean         1.06    1.04    1.05    1.05      1.07    1.05    1.13    1.10
  SD           0.01    0.01    0.01    0.01      0.02    0.01    0.02    0.02
HR
  Min.         1.90    2.20    2.19    2.25      2.17    2.24    2.26    2.44
  Max.         2.56    2.76    2.91    3.20      2.82    2.89    2.78    3.34
  Mean         2.29    2.46    2.56    2.78      2.49    2.60    2.51    2.91
  SD           0.10    0.10    0.10    0.17      0.10    0.12    0.09    0.15



Note: MFD = maximum Feret diameter; EF = elongation factor; CF = compactness factor; CHR = convex hull ratio; HCF = Heywood circularity factor; 
HR = hydraulic radius. 
Fig. 2. Relative frequency distribution data for area and maximum Feret diameter (MFD) for the four plant samples. (A) Area, equatorial view; (B) 
MFD, equatorial view; (C) area, polar view; (D) MFD, polar view. 



and a nonparametric (kernel-density estimate) naive Bayes classifier. These 
were performed as follows. 

The seven characteristics of the $i$th grain are denoted as the vector $\mathbf{x}_i$ and the 
class (kanuka = 0, manuka = 1) as the scalar $y_i$. Linear discriminant analysis fits 
two multivariate Gaussian distributions to the $\mathbf{x}_i$: one, with density $f_1(\mathbf{x})$, to the 
points with $y_i = 1$, and the other, $f_0(\mathbf{x})$, to the points with $y_i = 0$. A point $\mathbf{x}^0$ of 
unknown class is then classified as class 1 if $f_1(\mathbf{x}^0) > f_0(\mathbf{x}^0)$, and as class 0 otherwise. 
Logistic regression fits the linear regression model 

$$\log\frac{p}{1-p} = \beta_0 + \boldsymbol{\beta}^{\mathsf{T}}\mathbf{x},$$

where $p$ is the probability that the point $\mathbf{x}$ belongs to class 1 and $\beta_0$ and $\boldsymbol{\beta}$ are the fitted coefficients. Finally, the naive 
Bayes classifier assumes that the characteristics are independent, and thus that 
the multivariate density is the product of the univariate densities. The latter 
can then be estimated nonparametrically using kernel-density estimation. See 
Hastie et al. (2009) for more details on the three methods. 
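
The third method is the least standard of the three, so a minimal sketch may help. It assumes Gaussian kernels, a normal-reference bandwidth per feature, and equal class priors (consistent with the density comparison described for linear discriminant analysis); none of these choices is stated in the text, and the class and function names are hypothetical.

```python
import math

def kde_density(x, sample, bandwidth):
    """Gaussian kernel-density estimate of a univariate density at x."""
    norm = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample) / norm

def normal_reference_bandwidth(sample):
    """Rule-of-thumb bandwidth 1.06 * sd * n^(-1/5); an assumed choice.

    Requires at least two distinct values, otherwise the bandwidth is zero.
    """
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in sample) / (n - 1))
    return 1.06 * sd * n ** (-1.0 / 5.0)

class KdeNaiveBayes:
    """Naive Bayes with one univariate KDE per class and per feature."""

    def fit(self, X, y):
        self.columns = {}
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            cols = list(zip(*rows))  # one tuple of values per feature
            self.columns[label] = [
                (col, normal_reference_bandwidth(col)) for col in cols
            ]
        return self

    def log_density(self, x, label):
        # Independence assumption: the joint density is the product of the
        # univariate densities, so the log densities are summed.
        return sum(
            math.log(kde_density(v, col, bw) + 1e-300)  # guard against log(0)
            for v, (col, bw) in zip(x, self.columns[label])
        )

    def predict(self, x):
        # Equal priors assumed: choose the class with the larger density.
        return max(self.columns, key=lambda label: self.log_density(x, label))
```

Working with log densities avoids numerical underflow when the product is taken over all seven features.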

As the samples K1 and K2 were indistinguishable (see Results), they were 
pooled to give a single sample for kanuka pollen, henceforth referred to as 
sample K. However, the manuka samples M1 and M2 differed significantly, and 
hence the analysis was performed three times — once with each sample, and 
once with the pooled sample. For each case and method, an error rate for classifying 
the two (kanuka/manuka) samples was calculated, and the fitted classifier 
was then applied to the honey data to estimate the proportion of manuka. The 
analyses were performed separately for polar- and equatorial-oriented images. 
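
The workflow in this paragraph can be sketched as follows. The sketch makes two assumptions that are not stated in the text: the error rate is computed on the same grains used for fitting (resubstitution), and a simple nearest-mean threshold on the first feature stands in for the three classifiers actually used. All names are hypothetical.

```python
def error_rate(predict, X, y):
    """Fraction of labelled grains that predict() assigns to the wrong class."""
    return sum(1 for x, label in zip(X, y) if predict(x) != label) / len(y)

def manuka_proportion(predict, X_honey):
    """Fraction of honey grains that predict() assigns to class 1 (manuka)."""
    return sum(1 for x in X_honey if predict(x) == 1) / len(X_honey)

def fit_nearest_mean(X, y):
    """Placeholder classifier, NOT one of the three methods in the text.

    Thresholds the first feature halfway between the two class means and
    assumes the manuka mean is the larger of the two.
    """
    mean0 = sum(x[0] for x, lab in zip(X, y) if lab == 0) / y.count(0)
    mean1 = sum(x[0] for x, lab in zip(X, y) if lab == 1) / y.count(1)
    threshold = (mean0 + mean1) / 2.0
    return lambda x: 1 if x[0] > threshold else 0

def run_comparisons(fit, kanuka, manuka_samples, honey):
    """Fit once per manuka sample and once with all manuka samples pooled."""
    pooled = [x for sample in manuka_samples.values() for x in sample]
    results = {}
    for name, manuka in {**manuka_samples, "M": pooled}.items():
        X = kanuka + manuka
        y = [0] * len(kanuka) + [1] * len(manuka)
        predict = fit(X, y)
        results["K vs. " + name] = (
            error_rate(predict, X, y),          # resubstitution error (assumed)
            manuka_proportion(predict, honey),  # estimated share of manuka
        )
    return results
```

Any function that takes (X, y) and returns a prediction function can be passed as `fit`, so the same loop would serve for each of the three classifiers and for each image orientation.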



RESULTS 

Frequency distributions of area and MFD for both polar and 
equatorial views show close similarity in distribution for the 
two samples of kanuka, but significant differences exist be- 
tween the two samples of manuka (Fig. 2). Grains of sample 
M1 are similar in size and overlap significantly with K1 and K2, 
while grains of sample M2 are larger, with much less overlap 
with K1 and K2. Distributions for MFD of polar-oriented images 
show the narrowest range of overlap between the two taxa 
(Fig. 2D). Here, the frequency distributions for samples K1 and 
K2 are virtually identical. 

Frequency distributions of the other shape parameters (not 
shown graphically) presented considerable overlap between 
both taxa in all cases. However, combining pairs of parame- 
ters in bivariate plots has shown some potential for differen- 
tiation (Fig. 3). Again, the greatest degree of separation was 
achieved on the polar-oriented images. Effective combina- 
tions were between Heywood circularity factor, hydraulic 
radius, and MFD. No combination proved useful for equatori- 
ally oriented images. 

Figure 4 shows the frequency distribution of the manuka/ 
kanuka-type grains from the honey sample against the frequency 
Fig. 3. Biplots of Heywood circularity factor (HCF), maximum Feret 
diameter (MFD), and hydraulic radius (HR) for polar-oriented images. (A) 
HCF vs. MFD, (B) HR vs. MFD, (C) HCF vs. HR. 

distribution for the kanuka and manuka plant pollen samples, 
which have now been combined. If it is assumed that these fre- 
quency distributions are representative of the full range of the 
taxa (which is unlikely, see later discussion), then these data 
can be taken as an indication of what proportion of the manuka/ 
kanuka-type grains from the honey sample are actually manuka 
grains, which are kanuka, and which could be from either type. 



Similarly, plotting the honey grain values over the bivariate 
plots (Fig. 5) also provides a visual indication of what propor- 
tion of the grains are likely to actually be manuka. Such a 
graphical illustration is not easily achieved with the multivari- 
ate analyses. 

Results of the multivariate statistical analyses are presented 
in Table 2. As was the case for the frequency distributions and 
biplots, the polar-oriented images provide far better discrimina- 
tion. The equatorial-oriented images appear to be particularly 
sensitive to the difference in the manuka samples; the error 
rates appear to be a reflection of the overlap in the size distributions 
between sample M1 and samples K1 and K2, and the difference 
between M2 and the kanuka samples. Hence, the 
classifier appears to be working mainly on size, which is 
undesirable. 

In contrast, when using the polar-oriented images, there is 
less discrepancy between the error rates for the different manuka 
samples. In fact, the size-concordant samples (K vs. M1) are better 
classified than the size-discordant ones (K vs. M2), but all of the error rates 
are lower than those for the equatorial images. 

The error rates, and indeed the estimated proportion of the 
honey pollen that is manuka, differ little from classifier to clas- 
sifier. In combining the manuka samples, the linear discrimi- 
nant analysis fits a single Gaussian distribution to the combined 
sample, in effect treating the two samples as taken from a 
Gaussian superpopulation. In contrast, the naive Bayes classi- 
fier uses a kernel-density estimate, thus preserving the multi- 
modal nature of the data. However, both linear discriminant 
analysis and the naive Bayes classifier estimate the honey 
pollen to be 96% manuka, which we can thus consider to be 
a robust result. Logistic regression produces an estimate between 
the two, but with a slightly lower error rate, and may thus be the 
best classifier. 
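
The contrast between a single Gaussian and a kernel-density estimate on a pooled, bimodal sample can be illustrated numerically. The MFD values and the bandwidth below are hypothetical and chosen only to make two modes visible; they are not the study data.

```python
import math
import statistics

def gaussian_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def kde_pdf(x, sample, bandwidth):
    """Gaussian kernel-density estimate at x."""
    return sum(gaussian_pdf(x, s, bandwidth) for s in sample) / len(sample)

# Hypothetical MFD values (µm) for two manuka samples with different means.
m1 = [13.4, 13.6, 13.7, 13.8, 14.0]
m2 = [14.7, 14.8, 14.9, 15.0, 15.2]
pooled = m1 + m2

pooled_mean = statistics.mean(pooled)
pooled_sd = statistics.stdev(pooled)
bandwidth = 0.15  # assumed value, for illustration only

# The single Gaussian places its peak at the pooled mean, between the modes.
single_at_mean = gaussian_pdf(pooled_mean, pooled_mean, pooled_sd)
single_at_mode = gaussian_pdf(13.7, pooled_mean, pooled_sd)

# The kernel-density estimate keeps both modes and dips between them.
kde_at_mean = kde_pdf(pooled_mean, pooled, bandwidth)
kde_at_mode1 = kde_pdf(13.7, pooled, bandwidth)
kde_at_mode2 = kde_pdf(14.9, pooled, bandwidth)
```

With these numbers the single Gaussian is highest at the pooled mean, where few grains actually lie, whereas the kernel-density estimate is lowest there and highest near each sample's own mode.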



DISCUSSION 

This preliminary study has demonstrated potential for using 
some basic size and shape features extracted from automati- 
cally collected digital images to build a model to aid in clas- 
sification of L. scoparium (manuka) and K. ericoides (kanuka) 
pollen, which differ only in shape and/or size. The single 
factor that gave the best apparent discrimination between the 
two taxa was MFD measured in polar-oriented images. 
Combinations of MFD, Heywood circularity factor, and hydrau- 
lic radius were also useful, but again only in polar-oriented 
images. Likewise, multivariate classifications based on polar- 
oriented images delivered the lowest error rates. Measure- 
ments made in polar orientation may be more discriminative 
because Heywood circularity factor and hydraulic radius are 
both essentially measures of circularity. Therefore, variation 
in these is likely to indicate variation in amb shape between 
the pollen types, which can only be effectively captured in 
polar-oriented images. MFD is also likely to be affected by 
amb shape. In grains with a concave amb, MFD will most 
likely lie between two pores/corners of the grain, while in 
grains with a straight to convex amb, MFD would potentially 
lie between a pore and the crest of the limb opposite. Previ- 
ous research has reported variation in shape in equatorial 
view, with L. scoparium oblate, and K. ericoides oblate to 
peroblate (Moar, 1993). This variation has not been clearly 
captured in the data for equatorial images, suggesting it is 
not significant. 
Fig. 4. Relative frequency distribution data for area and maximum Feret diameter (MFD). Distributions for the two individuals of each taxon have been 
combined to produce a single distribution, and data for the honey sample have been added. (A) Area, equatorial view; (B) MFD, equatorial view; (C) area, 
polar view; (D) MFD, polar view. 



With respect to the honey sample, the frequency distribu- 
tions for area and MFD values of the manuka/kanuka-type 
grains overlapped with the distribution of the two L. scopar- 
ium samples, in particular that of sample M2 (Fig. 4). Without 
any further statistical considerations, these results indicate 
that 40-80% of the pollen grains are from L. scoparium, and 
the bulk of the remainder from either K. ericoides or L. sco- 
parium. These values have been obtained by taking the pro- 
portions of grains that occur within the portion of the 
distribution exclusive to L. scoparium. If it is assumed that 
MFD of polar-oriented images is the most discriminating feature, 
then around 80% of the grains appear to be of L. scoparium. 
When the manuka/kanuka-type grains are added to 
the biplot combinations (Fig. 5), a similar pattern emerges, in 
that the bulk of the grains plot in the general field defined by 
the two L. scoparium samples. However, rather than being 
spread evenly over this field, the data points fall in the region 
between the M1 and M2 samples. 
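
The range-based estimate described above, in which honey grains are assigned according to which reference distribution their value falls within, can be sketched as below. The function name is hypothetical, and using the observed minimum and maximum of each reference sample as the limits of its distribution is an assumption.

```python
def split_by_reference_ranges(honey, kanuka_ref, manuka_ref):
    """Share of honey grains in each region of two reference ranges.

    honey, kanuka_ref, and manuka_ref are lists of one measurement
    (e.g., MFD in polar view) per grain.
    """
    k_lo, k_hi = min(kanuka_ref), max(kanuka_ref)
    m_lo, m_hi = min(manuka_ref), max(manuka_ref)
    counts = {"manuka_only": 0, "kanuka_only": 0, "either": 0, "outside": 0}
    for v in honey:
        in_k = k_lo <= v <= k_hi
        in_m = m_lo <= v <= m_hi
        if in_m and not in_k:
            counts["manuka_only"] += 1   # exclusive to the manuka range
        elif in_k and not in_m:
            counts["kanuka_only"] += 1   # exclusive to the kanuka range
        elif in_k and in_m:
            counts["either"] += 1        # overlap: could be either taxon
        else:
            counts["outside"] += 1       # not covered by either reference
    return {key: c / len(honey) for key, c in counts.items()}
```

The "outside" share is worth reporting alongside the others, because grains beyond both reference ranges suggest that the reference distributions are incomplete.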



Comparing the frequency distribution of the manuka/kanuka- 
type pollen grains against that of the plant pollen samples is an 
extremely simplistic approach. However, it does provide a 
graphical indication of which taxon the bulk of the grains are 
most likely derived from. Such an approach is obviously not 
new (e.g., Hansen, 1947), and also involves some significant 
assumptions about the distributions and limits of the data (Gordon 
and Prentice, 1977). However, obtaining and comparing 
quantitative data from such a large number of observations 
is virtually impossible in routine honey pollen analysis, and 
the automated approach therefore still offers some advantages over the status quo. 

Subjecting the data to the statistical analyses yields a numeri- 
cal value for the manuka/kanuka composition of the sample of 
~96% for all three classifiers, with error rates comparable (3-4%) 
across the three classifiers. Agreement between the three 
classifiers lends confidence to the accuracy of this value. This in 
turn implies that the values obtained from the single parameter 
and bivariate analyses are underestimating the proportion of 
Fig. 5. Biplots of Heywood circularity factor (HCF), maximum Feret 
diameter (MFD), and hydraulic radius (HR) for polar-oriented images. 
Data for the two individuals of each taxon have been combined to produce a 
single distribution, and data for the honey sample have been added. (A) 
HCF vs. MFD, (B) HR vs. MFD, (C) HCF vs. HR. 

L. scoparium grains in the sample. As mentioned earlier, this 
is a pilot study into further clarifying the size and shape varia- 
tion between two morphologically similar taxa using automat- 
ically generated imagery, and testing the usefulness of selected 
geometric parameters in distinguishing the pollen types. Obviously, 



this has involved a number of assumptions, the most significant 
being: 

The four plant samples are representative. 

The subset of images used (i.e., only polar and equatorial 
views) is representative of the range of shape and size of the 
pollen in the samples. 

Results are not biased by changes in pollen shape or size dur- 
ing and after laboratory processing, or by different origins 
(plant vs. honey). 

With respect to the first assumption, it is highly unlikely that this 
is the case. The four individuals used here are far from satisfactory; 
they were simply what was on hand at the time the pilot was con- 
ceived (which fell outside of the flowering period of the two taxa). 
The fact that up to ~2% of the manuka/kanuka-type grains from the 
honey sample fall outside the size range (i.e., area and MFD) of the 
four plant samples (Fig. 4) is evidence that the existing data set 
does not fully capture the range of the two taxa. The difference 
between the two L. scoparium samples in nearly all parameters 
measured is further indication that there may be considerable vari- 
ability in this taxon. Before this technique can be applied to more 
conclusively assess the composition of manuka/kanuka-type pol- 
len in honey samples (or other types of samples, for that matter), 
comprehensive reference distributions must be defined. 

Obtaining the material to build these distributions will require 
repeating the analyses described above on a collection of indi- 
viduals that represent the full geographical and ecological range 
of the taxon (Gordon and Prentice, 1977). Building representa- 
tive distributions will be further complicated by the fact that 
Leptospermum and Kunzea are capable of hybridizing (Harris 
et al., 1992), and that there are numerous undescribed species in 
the Kunzea genus in New Zealand. Repeat sampling of the same 
individuals in different years may also be needed to capture year- 
to-year variation in pollen shape and size resulting from different 
growing conditions (e.g., effects of drought). This is a challeng- 
ing task, but again is made more achievable through the appli- 
cation of automated systems to gather the data needed. 

The second assumption is important because, in this case, the 
subset of 1000 grains of the correct orientations represents only 
5-10% of the total number of grains imaged in each sample. 
The techniques applied in this paper cannot be applied to im- 
ages of manuka/kanuka-type grains in any orientation, as this 
just produces a continuum of values. To apply the results to the 
whole sample requires that this subset is representative of the 
full sample. Bias could be expected if pollen size or shape influ- 
enced the final orientation of the grain on the slide. This is 
likely to occur in very large or irregularly shaped pollen grains 
(e.g., saccate grains such as Pinus, Abies, etc.) but not in smaller 
grains like those of K. ericoides and L. scoparium. Neverthe- 
less, there remains a degree of uncertainty around this issue, 
and it is a difficult issue to test. However, further developments 
in imaging and image processing (using either the Classifynder 
or other system) might overcome the need for the oriented images. 
For example, the Classifynder approach of capturing several 
images through the range of focus offers the opportunity to cal- 
culate three-dimensional parameters of shape and size. At pres- 
ent, the system takes the in-focus portion of each image and 
stitches them together to produce a single two-dimensional image 
from which the 43 features are extracted. However, the poten- 
tial exists to extract more information on the three-dimensional 
shape of the pollen. This could then open the door to making 
Table 2. Error rates and proportion of honey pollen identified as manuka, as determined by the three different classification methods. 

                LDA                    Logistic regression    Naive Bayes
Comparison      Error rate  Proportion Error rate  Proportion Error rate  Proportion
Polar
  K vs. M1      0.0214      ...        0.0188      0.9058     0.0240      0.9040
  K vs. M2      0.0359      0.9493     0.0316      0.9601     0.0347      0.9620
  K vs. M       0.0354      0.9601     0.0331      0.9620     0.0373      0.9638
Equatorial
  K vs. M1      0.1485      0.9276     0.1478      0.9291     0.1625      0.9323
  K vs. M2      0.0495      0.8173     0.0397      0.8772     0.0508      0.8520
  K vs. M       0.1327      0.9181     0.1219      0.9323     0.1229      0.9134

Note: LDA = linear discriminant analysis; K = combined population of K1 and K2; M = combined population of M1 and M2; M1 and M2 as per text. 



measurements on all grains imaged, not just those in the correct 
orientations. The key issue is whether there is a difference in the 
first place, which was one of the aims of this pilot study. 

With respect to the third assumption, as mentioned in the 
Methods section, pollen size and shape are known to change in 
response to laboratory treatments. By treating all samples 
equally, it is hoped that any changes in shape that do occur are 
consistent in magnitude and nature across the five samples. 
However, it is difficult to account for changes in pollen size and 
shape in the honey pollen grains, which may have resulted from 
processing through the bee and time spent suspended in honey. 
Exposure to moisture during this time would have likely caused 
some expansion in the pollen grain. The dehydration associated 
with the acetolysis procedure should have reversed this expan- 
sion. Nevertheless, the results for area and MFD show that the 
honey grains are on average larger than those of the plant sam- 
ples. This could either result from inadequate reference distribu- 
tions (discussed above) or could represent expansion from time 
spent in honey, or a combination of the two. Again, further re- 
search into this area is required. This could best be achieved 
through collecting pollen directly from manuka and kanuka plants 
and comparing it with pollen extracted from honey gathered from 
the same plants or from plants in the same region (this requires that 
pollen morphology of the two taxa does not vary significantly 
within a local population). Restricting processing to the minimum 
steps (i.e., ethanol extraction from honey) followed by mounting, 
in glycerine jelly or a similar medium that requires minimal prepa- 
ration, would limit the opportunity for processing-related shape 
and size changes to occur. Therefore, any significant differences in 
shape and size between the plant and honey pollen samples could 
reasonably be attributed to the effects of "bee processing." 

CONCLUSIONS 

Overall, this investigation reinforces earlier work showing 
that an overlap exists between the two taxa. The use of automati- 
cally generated large data sets has produced a more comprehen- 
sive picture of the size differences between the individuals 
surveyed than previously existed. Applying geometrical mea- 
sures of shape and size, which are not standard in traditional 
pollen morphology studies, has also shown potential in differen- 
tiating pollen types that vary only in size and/or subtly in shape. 
For the purposes of demonstration, the assumption has been 
made that these individuals are representative of their taxon. 
However, this is obviously unlikely, and much more work is 
required to produce the comprehensive reference distributions 
needed. This would also be the case if this methodology were to 



be applied to other morphologically similar pollen types where 
subtle shape and size variations were the only difference. 